Sublinear Time Approximation of Text Similarity Matrices
Authors
Abstract
We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for n data points requires Omega(n^2) similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., transformer models. Approximation methods reduce this quadratic complexity, often by using a small subset of exactly computed similarities to approximate the remainder of the complete pairwise similarity matrix. Significant work focuses on the efficient approximation of positive semidefinite (PSD) similarity matrices, which arise, e.g., in kernel methods. However, much less is understood about indefinite (non-PSD) similarity matrices, which often arise in NLP. Motivated by the observation that many of these matrices are still somewhat close to PSD, we introduce a generalization of the popular Nyström method to the indefinite setting. Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix, producing a rank-s approximation with just O(ns) similarity computations. We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices arising in NLP tasks. We demonstrate high accuracy of the approximated similarity matrices in the downstream tasks of document classification, sentence similarity, and cross-document coreference.
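To make the O(ns) budget concrete, here is a minimal sketch of the classic Nyström approximation for the PSD case, written with NumPy. It is not the indefinite generalization introduced in the paper, whose details the abstract does not give. The function name `nystrom`, the helper class `CountingSim`, and the toy low-rank matrix are assumptions made purely for illustration.

```python
import numpy as np


class CountingSim:
    """Hypothetical stand-in for an expensive similarity function.

    It looks values up in a precomputed matrix and counts how often it is called.
    """

    def __init__(self, K):
        self.K = K
        self.calls = 0

    def __call__(self, i, j):
        self.calls += 1
        return self.K[i, j]


def nystrom(sim, n, s, rng):
    """Classic Nystrom factors from s uniformly sampled landmark points.

    Only the n x s block of similarities to the landmarks is evaluated.
    """
    idx = rng.choice(n, size=s, replace=False)
    # C[i, k] = similarity between point i and the k-th landmark
    C = np.array([[sim(i, j) for j in idx] for i in range(n)])
    # W is the s x s block among the landmarks, read from C at no extra cost
    W = C[idx, :]
    # Explicit cutoff so numerically zero singular values are not inverted
    W_pinv = np.linalg.pinv(W, rcond=1e-10)
    return C, W_pinv, idx


rng = np.random.default_rng(0)
n, s = 200, 20
X = rng.standard_normal((n, 5))
K = X @ X.T  # toy PSD similarity matrix of rank 5
sim = CountingSim(K)

C, W_pinv, idx = nystrom(sim, n, s, rng)
K_hat = C @ W_pinv @ C.T  # approximation of rank at most s
rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
print(sim.calls, rel_err)
```

The toy matrix has rank 5, so 20 landmarks recover it almost exactly; real NLP similarity matrices are only approximately low rank and may be indefinite, which is the setting the paper targets. In practice one would keep the factors `C` and `W_pinv` rather than forming the full n x n product.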
Similar resources
Text Indexing and Searching in Sublinear Time
We introduce the first index that can be built in o(n) time for a text of length n, and also queried in o(m) time for a pattern of length m. On a constant-size alphabet, for example, our index uses O(n log n) bits, is built in O(n / log n) deterministic time, and finds the occ pattern occurrences in time O(m / log n + √(log n) log log n + occ), where ε > 0 is an arbitrarily small constant. As a compa...
Sublinear Approximation of Signals
It has recently been observed that sparse and compressible signals can be sketched using very few nonadaptive linear measurements in comparison with the length of the signal. This sketch can be viewed as an embedding of an entire class of compressible signals into a low-dimensional space. In particular, d-dimensional signals with m nonzero entries (m-sparse signals) can be embedded in O(m log d...
Sublinear Graph Approximation Algorithms
Motivation: we want to learn a combinatorial parameter of a graph: the maximum matching size, the independence number α(G), the minimum vertex cover size, the minimum dominating set size. (Krzysztof Onak, Sublinear Graph Approximation Algorithms, p. 2/32) ...
Improved Approximation Guarantees for Sublinear-Time Fourier Algorithms
In this paper modified variants of the sparse Fourier transform algorithms from [32] are presented which improve on the approximation error bounds of the original algorithms. In addition, simple methods for extending the improved sparse Fourier transforms to higher dimensional settings are developed. As a consequence, approximate Fourier transforms are obtained which will identify a near-optima...
Sublinear-Time Approximation for Clustering Via Random Sampling
In this paper we present a novel analysis of a random sampling approach for three clustering problems in metric spaces: k-median, min-sum k-clustering, and balanced k-median. For all these problems we consider the following simple sampling scheme: select a small sample set of points uniformly at random from V and then run some approximation algorithm on this sample set to compute an approximati...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i7.20779